Comparison of LZ77-type Parsings

Authors

  • Dmitry Kosolobov
  • Arseny M. Shur
Abstract

We investigate the relations between different variants of the LZ77 parsing existing in the literature. All of them are defined as greedily constructed parsings encoding each phrase by reference to a string occurring earlier in the input. They differ by the phrase encodings: encoded by pairs (length + position of an earlier occurrence) or by triples (length + position of an earlier occurrence + the letter following the earlier occurring part); and they differ by allowing or not allowing overlaps between the phrase and its earlier occurrence. For a given string of length $n$ over an alphabet of size $\sigma$, denote the numbers of phrases in the parsings allowing (resp., not allowing) overlaps by $z$ (resp., $\hat{z}$), for “pairs”, and by $z_3$ (resp., $\hat{z}_3$), for “triples”. We prove the following bounds and provide series of examples showing that these bounds are tight:

  • $z \le \hat{z} \le z \cdot O\left(\log \frac{n}{z \log_\sigma z}\right)$ and $z_3 \le \hat{z}_3 \le z_3 \cdot O\left(\log \frac{n}{z_3 \log_\sigma z_3}\right)$;
  • $\frac{1}{2}\hat{z} < \hat{z}_3 \le \hat{z}$ and $\frac{1}{2} z < z_3 \le z$.
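The four phrase counts defined in the abstract can be illustrated with a naive greedy parser. The following Python sketch is not taken from the paper: the function names (`longest_earlier_match`, `count_phrases`, `parsing_sizes`) are hypothetical, the matching is brute force (roughly quadratic or worse), and it assumes that a letter with no earlier occurrence forms a phrase by itself in the "pairs" variants and that the final "triples" phrase may lack its trailing letter when the input ends.

```python
def longest_earlier_match(s, i, overlap):
    """Length of the longest prefix of s[i:] that also starts at some j < i.

    With overlap=False the earlier occurrence must end before position i.
    """
    best = 0
    for j in range(i):
        length = 0
        while (i + length < len(s)
               and (overlap or j + length < i)
               and s[j + length] == s[i + length]):
            length += 1
        best = max(best, length)
    return best


def count_phrases(s, overlap, triples):
    """Number of phrases in the greedy parsing of s (illustrative only)."""
    i = 0
    count = 0
    while i < len(s):
        length = longest_earlier_match(s, i, overlap)
        if triples:
            i += length + 1       # copied part plus one explicit letter
        else:
            i += max(length, 1)   # a fresh letter is a phrase on its own
        count += 1
    return count


def parsing_sizes(s):
    """Return the four counts: z, z-hat (pairs) and z3, z3-hat (triples)."""
    return {
        "z": count_phrases(s, overlap=True, triples=False),
        "z_hat": count_phrases(s, overlap=False, triples=False),
        "z3": count_phrases(s, overlap=True, triples=True),
        "z3_hat": count_phrases(s, overlap=False, triples=True),
    }


if __name__ == "__main__":
    # A unary string shows the gap between overlapping and
    # non-overlapping parsings.
    print(parsing_sizes("a" * 16))
    # -> {'z': 2, 'z_hat': 5, 'z3': 2, 'z3_hat': 5}
```

On the string of sixteen letters `a`, the overlapping parsings need only two phrases, while the non-overlapping ones need five, because each non-overlapping phrase can at most double the parsed prefix. This is consistent with the stated bounds but is only a single example, not evidence of tightness.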


Similar resources

15418 Parallel Computer Architecture and Programming Project Report: Implementation and Comparison of Parallel LZ77 and LZ78 Algorithms

Our project developed a linear-work parallel LZ77 compression algorithm with unlimited window size. We also implemented two versions of the LZW compression algorithm. We cross-compared all the algorithms and gave suggestions on how to choose an algorithm for a real application.


A Low-Power CAM Design for LZ Data Compression

Low-power and high-performance data compressors play an increasingly important role in the portable mobile computing and wireless communication markets. Among lossless data compression algorithms for hardware implementation, LZ77 is one of the most widely used. For real-time communication, some hardware LZ compressors/decompressors have been proposed in the past. Content addressable memory (CA...


Lempel-Ziv factorization: Simple, fast, practical

For decades the Lempel-Ziv (LZ77) factorization has been a cornerstone of data compression and string processing algorithms, and uses for it are still being uncovered. For example, LZ77 is central to several recent text indexing data structures designed to search highly repetitive collections. However, in many applications computation of the factorization remains a bottleneck in practice. In th...


Bicriteria data compression

The advent of massive datasets and the consequent design of high-performing distributed storage systems—such as BigTable by Google [7], Cassandra by Facebook [5], Hadoop by Apache—have reignited the interest of the scientific and engineering community towards the design of lossless data compressors which achieve effective compression ratio and very efficient decompression speed. Lempel-Ziv’s LZ...


Applying Compression to a Game's Network Protocol

This report presents the results of applying different compression algorithms to the network protocol of an online game. The algorithm implementations compared are zlib, liblzma and my own implementation based on LZ77 and a variation of adaptive Huffman coding. The comparison data was collected from the game TomeNET. The results show that adaptive coding is especially useful for compressing lar...



Journal:
  • CoRR

Volume: abs/1708.03558

Publication date: 2017